Policy Gradient Method for Team Markov Games

Author

  • Ville Könönen
Abstract

The main aim of this paper is to extend the single-agent policy gradient method to multiagent domains in which all agents share the same utility function. We formulate these team problems as Markov games endowed with the asymmetric equilibrium concept and, based on this formulation, provide a direct policy gradient learning method. In addition, we test the proposed method on a small example problem.
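The abstract does not spell out the update rule, so the following is only a minimal REINFORCE-style sketch of a direct policy gradient method for a team Markov game in which every agent receives the same reward. The toy environment, the matching-action reward signal, and all hyperparameters are illustrative assumptions, not Könönen's asymmetric formulation.

```python
# Minimal sketch (assumptions, not the paper's method): REINFORCE-style
# policy gradient for a team Markov game where all agents share one reward.
import numpy as np

n_agents, n_states, n_actions = 2, 4, 3
rng = np.random.default_rng(0)
# One table of policy parameters (logits) per agent: theta[i][s, a].
theta = [np.zeros((n_states, n_actions)) for _ in range(n_agents)]

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def sample_episode(T=20):
    """Roll out the joint policy in a toy environment (stand-in dynamics)."""
    s, traj = 0, []
    for _ in range(T):
        acts = [rng.choice(n_actions, p=softmax(theta[i][s]))
                for i in range(n_agents)]
        r = 1.0 if len(set(acts)) == 1 else 0.0   # shared reward for matching
        traj.append((s, acts, r))
        s = (s + sum(acts)) % n_states            # arbitrary toy transition
    return traj

alpha, gamma = 0.1, 0.95
for _ in range(500):
    G = 0.0
    for s, acts, r in reversed(sample_episode()): # returns-to-go, backwards
        G = r + gamma * G
        for i, a in enumerate(acts):              # each agent climbs the same return
            probs = softmax(theta[i][s])
            grad = -probs
            grad[a] += 1.0                        # d log pi(a|s) / d logits
            theta[i][s] += alpha * G * grad
```

Because the utility function is shared, every agent can ascend the gradient of the same sampled return, which is all the inner loop does.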


Similar articles

Learning of Soccer Player Agents Using a Policy Gradient Method: Pass Selection

This research develops a learning method for the pass-selection problem faced by midfielders in RoboCup Soccer Simulation games. A policy gradient method is applied because it can easily represent the various pass-selection heuristics in a policy function. We implement the learning function in the midfielders’ programs of a well-known team, UvA Trilearn B...
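As a hedged illustration of the claim that pass-selection heuristics are easy to encode in a policy function, the sketch below scores each candidate pass with a weighted feature sum and converts the scores into selection probabilities with a softmax. The features and weights are hypothetical, not taken from the UvA Trilearn code.

```python
# Softmax policy over candidate passes scored by heuristic features.
# Feature names and weights are invented for illustration only.
import numpy as np

def pass_policy(candidates, w):
    """candidates: one feature vector per pass option; w: policy weights."""
    scores = candidates @ w                 # heuristic score per candidate
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()              # softmax selection probabilities

# Example: 3 candidates described by (distance, opponent pressure, expected gain).
candidates = np.array([[0.2, 0.1, 0.8],
                       [0.5, 0.6, 0.9],
                       [0.9, 0.3, 0.4]])
w = np.array([-1.0, -2.0, 3.0])             # initial heuristic weighting
print(pass_policy(candidates, w))           # a policy gradient step would adapt w
```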


Rational and Convergent Model-Free Adaptive Learning for Team Markov Games

In this paper, we address multi-agent decision problems where all agents share a common goal. This class of problems is suitably modeled using finite-state Markov games with identical interests. We tackle the problem of coordination and contribute a new algorithm, coordinated Q-learning (CQL). CQL combines Q-learning with biased adaptive play, a coordination mechanism based on the principle of f...
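The abstract is cut off before the coordination mechanism is described, so the sketch below is only an assumption-laden stand-in that combines the two named ingredients: a joint-action Q-learning update and a bias toward joint actions that coordinated well in the past. It is not the actual CQL rule, whose biased-adaptive-play mechanism is specified in the paper.

```python
# Stand-in for the two ingredients named above: joint-action Q-learning
# plus a tie-breaking bias toward previously successful joint actions.
import numpy as np

n_states, n_joint_actions = 5, 4
Q = np.zeros((n_states, n_joint_actions))
bias = np.zeros((n_states, n_joint_actions))    # counts of past successes
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(1)

def select(s):
    if rng.random() < eps:                      # epsilon-greedy exploration
        return int(rng.integers(n_joint_actions))
    best = np.flatnonzero(Q[s] == Q[s].max())   # greedy joint actions
    return int(best[np.argmax(bias[s, best])])  # break ties by past coordination

def update(s, a, r, s_next):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    if r > 0:
        bias[s, a] += 1                         # remember what coordinated well

update(0, select(0), 1.0, 1)                    # one illustrative step
```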


Learning to Cooperate via Policy Search

Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environm...


UvA Rescue Team Description Paper, Agent Competition, Rescue Simulation League, RoboCup 2014 - João Pessoa - Brazil

The contribution of the UvA Rescue Team is an attempt to lay a theoretical foundation by formally describing the planning and coordination problem as a POMDP, which allows POMDP solution methods to be applied in this application area. To be able to solve the POMDP for large state spaces and long planning histories, our team has chosen an approximation of the Dec-POMDPs throu...


Policy-Gradient Algorithms for Partially Observable Markov Decision Processes

Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches....



Publication date: 2004